Use the ma_lga_12345.csv file for your analysis. Alternatively, you may use raw_sales.csv if you prefer.
Assignment Specifications
Implement at least one recurrent neural network model (e.g., SimpleRNN or LSTM) to predict Sales.
Important: Since this is a time series task, do not split your data randomly. Maintain the temporal structure in your train-test splits.
One of the goals is to explore the nuances of temporal prediction.
Required Experiments
Train-Test Splits
Experiment with at least three different train-test splits that preserve the chronological order of data.
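Because shuffling would leak future information into training, a chronological split simply cuts the ordered series at a ratio. A minimal sketch (the helper name `chronological_split` is illustrative):

```python
import numpy as np

def chronological_split(series, train_ratio):
    """Split an ordered series into train/test without shuffling,
    so the test set lies strictly after the training set in time."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

data = np.arange(10)                     # stand-in for an ordered time series
train, test = chronological_split(data, 0.7)
# train holds the earliest 70% of observations; test holds the remainder
```

Running the same helper with ratios such as 0.6, 0.7, and 0.8 gives the three required splits.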
Historical Data Input
Experiment with at least three different values for the number of historical time steps used as input to the model (e.g., 1-step, 2-step, 3-step history).
Preprocessing may differ depending on the RNN architecture you choose.
Visualization
Plot your results, showing raw data overlaid with predictions.
Code
# Data handling
import pandas as pd
import numpy as np

# For reproducibility
import random
import tensorflow as tf
import keras
from keras.initializers import GlorotUniform

# Set random seeds for consistency
np.random.seed(100)
tf.random.set_seed(100)
random.seed(100)
seeded_init = GlorotUniform(seed=100)

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import cm

# Preprocessing
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Deep learning (TensorFlow / Keras)
from keras.models import Sequential
from keras.layers import Dense, LSTM, SimpleRNN, GRU, Input
from keras.callbacks import EarlyStopping

# Utility for time series processing
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

# Date handling (if necessary)
import datetime
The Data
Code
# Load the dataset
df = pd.read_csv('Data/ma_lga_12345.csv')
df = df.sort_values('saledate')

# Quick preview
print(df.columns)
df.head()
Summary statistics for the smoothed median price (MA):

count       147.0
mean     439743.0
std      101158.0
min      316751.0
25%      339172.0
50%      425922.0
75%      533025.0
max      622045.0
Name: MA, dtype: float64
Exploring the Data
Before building our forecasting models, we began by exploring the structure and contents of our dataset. The dataset includes four columns:
- saledate: the date the property was sold
- MA: the smoothed median sale price, used as our target for prediction
- type: whether the property is a house or a unit
- bedrooms: the number of bedrooms in the property
We noticed that the data spans from mid-2007 through 2019, with most years containing 28 quarterly entries. This consistency suggests the dataset has been regularly aggregated over time, which is ideal for building time series models.
Next, we examined the distribution of sale prices. For houses, the median smoothed sale price (MA) is around 585,000, while for units it is lower, at around 426,000. As expected, houses tend to sell for more than units, and there is a reasonable spread in prices across the years. We also noted that there are slightly fewer data points toward the start and end of the time range, which is common in time series when the data is trimmed or collection is just beginning.
Code
# Subset houses (the earlier filtering cell is not shown; 'house' is the
# type label described in the data overview)
df_house = df[df['type'] == 'house'].copy()

df_house.loc[:, 'saledate'] = pd.to_datetime(df_house['saledate'], dayfirst=True)
df_house = df_house.sort_values('saledate')

# Set figure
plt.figure(figsize=(12, 6))

# Loop through each bedroom count
for bedroom, group in df_house.groupby('bedrooms'):
    plt.plot(group['saledate'], group['MA'], label=f'{bedroom} BR')

plt.xlabel('Date')
plt.ylabel('Moving Average (MA)')
plt.title('House MA Over Time Colored by Bedroom Count')
plt.legend(title='Bedrooms')
plt.grid(True)
plt.tight_layout()
plt.show()
Code
# Subset units (the earlier filtering cell is not shown; 'unit' is the
# type label described in the data overview)
df_unit = df[df['type'] == 'unit'].copy()

df_unit.loc[:, 'saledate'] = pd.to_datetime(df_unit['saledate'], dayfirst=True)
df_unit = df_unit.sort_values('saledate')

# Set figure
plt.figure(figsize=(12, 6))

# Loop through each bedroom count
for bedroom, group in df_unit.groupby('bedrooms'):
    plt.plot(group['saledate'], group['MA'], label=f'{bedroom} BR')

plt.xlabel('Date')
plt.ylabel('Moving Average (MA)')
plt.title('Unit MA Over Time Colored by Bedroom Count')
plt.legend(title='Bedrooms')
plt.grid(True)
plt.tight_layout()
plt.show()
Visualizing Property Prices Over Time by Bedroom Count
To better understand the trends in our data, we visualized the smoothed median property prices (MA) over time, grouped by the number of bedrooms.
In the first plot, we focus on houses. As expected, houses with more bedrooms consistently sell for higher prices. The lines show a steady upward trend over time, particularly for 4-bedroom and 5-bedroom homes, which saw the steepest price growth between 2015 and 2018. This suggests strong demand for larger homes in recent years. Even 2- and 3-bedroom houses show growth, but at a slower rate and lower price level.
In the second plot, we examine units. These follow a similar pattern, though at a lower price level overall. Larger units (with 3 bedrooms) tend to have higher prices, but unlike houses, price growth for smaller units appears to have leveled off or even declined slightly after 2016. This could indicate saturation or shifting preferences in the unit market.
Together, these plots help us confirm that:
- Price behavior varies between houses and units, and across property sizes
- Time-based patterns are clearly present and meaningful
Modeling
Preparing the Data for the Neural Network
Time series models need structured input to learn from past patterns. The create_sequences function transforms our timeline of values into smaller chunks (or “windows”) of past data. For example, if we set a history length of 3, the model will learn to predict the next value based on the previous 3 values. This step is essential to teaching the model how past property prices relate to future prices.
Code
def create_sequences(data, steps):
    X, y = [], []
    for i in range(steps, len(data)):
        X.append(data[i - steps:i])
        y.append(data[i])
    return np.array(X), np.array(y)
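As a quick sanity check of the windowing (the function is restated so the snippet runs on its own), a 3-step history over six observations yields three input/target pairs:

```python
import numpy as np

def create_sequences(data, steps):
    # slide a fixed-length window over the series; each window predicts
    # the value that immediately follows it
    X, y = [], []
    for i in range(steps, len(data)):
        X.append(data[i - steps:i])
        y.append(data[i])
    return np.array(X), np.array(y)

series = np.array([10, 20, 30, 40, 50, 60])
X, y = create_sequences(series, steps=3)
print(X.shape, y.shape)   # (3, 3) (3,)
# X[0] = [10 20 30] is used to predict y[0] = 40
```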
Building and Running Our Forecasting Model
This function handles the full forecasting process for one specific group (such as 1-bedroom units). First it prepares the data, then it trains a neural network model (either a basic RNN or an LSTM, a variant of RNN designed to retain longer-range context). The model learns how past property prices relate to future prices. After training, the model's predictions are compared to actual values using a standard error measure (RMSE), and the results are visualized over time. This helps us see how well the model performs.
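The function body is not reproduced here, so the following is a minimal sketch of such a pipeline under stated assumptions: the name `run_forecast`, the 32-unit layer size, and the training settings are all illustrative, not the exact code used in the report.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Input, Dense, SimpleRNN, LSTM

def run_forecast(values, steps=3, train_ratio=0.7, model_type='rnn', epochs=50):
    """Illustrative sketch: scale one series, window it, fit, report test RMSE."""
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(np.asarray(values).reshape(-1, 1)).flatten()

    # windowing, same idea as create_sequences
    X = np.array([scaled[i - steps:i] for i in range(steps, len(scaled))])
    y = scaled[steps:]
    X = X.reshape(-1, steps, 1)

    # chronological split: no shuffling
    cut = int(len(X) * train_ratio)
    X_train, X_test = X[:cut], X[cut:]
    y_train, y_test = y[:cut], y[cut:]

    layer = LSTM(32) if model_type == 'lstm' else SimpleRNN(32)
    model = Sequential([Input(shape=(steps, 1)), layer, Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    model.fit(X_train, y_train, epochs=epochs, verbose=0)

    # undo the scaling so RMSE is reported in original price units
    preds = scaler.inverse_transform(model.predict(X_test, verbose=0))
    actual = scaler.inverse_transform(y_test.reshape(-1, 1))
    return float(np.sqrt(mean_squared_error(actual, preds)))
```

Plotting `actual` against `preds` over the test dates then produces the overlay figures the report describes.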
This function helps us compare model performance under different conditions. It runs multiple forecasts using different amounts of training data (split ratios) and different lengths of past history (number of quarters used to predict the next). Each combination is plotted in a separate panel so we can easily spot trends in how the model performs under each setting. This lets us explore which setup gives us the most reliable predictions.
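The sweep itself amounts to a nested loop over split ratios and history lengths. In the sketch below, `evaluate_config` is a stand-in scorer (a naive repeat-the-last-value forecast, so the snippet runs without a trained network); the report plugs in the full RNN/LSTM training run at that point.

```python
import numpy as np

def evaluate_config(values, steps, train_ratio):
    """Stand-in scorer: RMSE of a naive 'repeat the previous value' forecast.
    The report's version trains an RNN/LSTM using `steps` of history instead."""
    cut = int(len(values) * train_ratio)
    test = values[cut:]
    preds = values[cut - 1:-1]               # persistence forecast, one step behind
    return float(np.sqrt(np.mean((test - preds) ** 2)))

values = np.linspace(300_000, 620_000, 48)   # synthetic quarterly-style series
results = {}
for train_ratio in (0.6, 0.7, 0.8):          # three chronological splits
    for steps in (1, 3, 5):                  # three history lengths
        results[(train_ratio, steps)] = evaluate_config(values, steps, train_ratio)
# one panel per (split, history) combination in the report's figures
```

The ratios (0.6/0.7/0.8) and history lengths (1/3/5) match the configurations discussed in the results below.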
RNN models performed reasonably well across all settings. The best performance was observed when using longer history windows (3 or 5 quarters) and larger training sizes. For example, using 5 steps of history and training on 70% of the data produced relatively low test RMSE values (~7133), and predictions tracked the actual trend smoothly.
Observations on LSTMs
LSTM results were more mixed. While LSTMs generally have the potential to outperform RNNs in time series tasks, they tended to overfit or struggle with very small training windows or short history sizes in our setup. Some test errors exceeded 60,000, especially for short histories and larger splits. One of the more stable results came from a 5-step history and a 60% train size, with test RMSE around 26,400 — still notably higher than the best RNN run.
Key Takeaways
Using more history helps: Larger input windows generally improved predictions for both RNN and LSTM models.
More training data improves stability, but for LSTMs, too large a split (e.g. 80%) sometimes led to poor generalization.
RNNs outperformed LSTMs in this specific case, suggesting that for simpler, smoother time series like housing MA, complexity does not always guarantee better results.
RNN Results
The RNN models showed strong performance across most settings, with visibly tight alignment between predicted and actual values — especially with longer history windows (3 or 5 steps). The lowest test error occurred using a 3-step history and an 80% training split, with a test RMSE of just 3,821, indicating highly accurate predictions. Even smaller splits performed reasonably well with errors mostly under 25,000.
LSTM Results
The LSTM models struggled more significantly. In some settings, particularly with smaller history windows, the test error spiked dramatically. For example, a 1-step history with 80% training data yielded a test RMSE over 164,000. That’s a clear indication the model failed to generalize. However, there were exceptions: the LSTM with a 5-step history and 60% training data gave a test RMSE of 40,606, showing some promise when more past context was provided.
Key Takeaways
For 3-bedroom houses, Simple RNNs were far more stable and accurate than LSTMs.
Increasing the amount of history consistently improved performance for both models.
LSTMs tended to overfit or underperform in this context, possibly due to the relatively small dataset and the smoothed nature of the MA series.
These findings suggest that for straightforward housing trends, simpler RNNs may be more reliable and interpretable than deeper models like LSTMs.
RNN Results
Simple RNN models again delivered strong results. All configurations with longer histories (3 or 5 past quarters) performed significantly better than those with only 1 history step. For example, using 3 history steps and 70% training data produced a test RMSE of just 17,762, and predictions followed actual trends very closely. Even the 5-step models across all splits remained consistent, with test RMSEs between 20,000 and 26,000.
In contrast, the 1-step history models showed much higher errors, especially at lower train ratios. Still, even those models (e.g., RMSE ≈ 49,644 at 80% training) were notably better than the comparable LSTM results.
LSTM Results
LSTM models once again showed instability, especially with short history lengths. For instance, a 1-step model with 80% training yielded a massive test RMSE of over 260,000. Increasing the history steps to 3 or 5 helped slightly in some cases, but overall test performance remained weak and erratic. The lowest test error from an LSTM was 97,742 — still several times worse than the best RNN result.
Key Takeaways
RNNs consistently outperformed LSTMs for this group of properties, across all tested configurations.
More history = better performance: As seen with 2BR and 3BR homes, using more quarters of past data improves model learning.
LSTMs did not generalize well on this data, possibly due to overfitting or sensitivity to the smoothed nature of the MA feature.
At this point in our analysis, it is increasingly clear that Simple RNNs are more appropriate for this kind of problem. The data’s regularity and lack of high-frequency noise may favor simpler architectures that don’t require as much regularization or memory management as LSTMs provide.
RNN Results
Simple RNNs continued to show solid performance. With a longer historical context (e.g., 3 or 5 quarters), models were able to follow the actual trend closely — especially at 70% and 80% training splits. The best result came from the 5-step RNN with a 70% training split, which yielded a test RMSE of just 26,590. Other models in this family also stayed within acceptable error margins, especially compared to their LSTM counterparts.
Shorter history windows (1 step) were again less reliable. For instance, the same 70% training split with only 1 step produced a much larger RMSE of over 114,000, reinforcing the pattern seen in other bedroom categories.
LSTM Results
LSTM models did not perform as well for 5-bedroom homes. The smallest history windows led to extremely poor generalization, with test RMSE values soaring beyond 250,000 in several configurations. Even the best LSTM result (RMSE ≈ 111,884) was about four times worse than the best RNN model for this property group.
These results may reflect LSTM overfitting due to the relatively low granularity and volume of the input data. Unlike natural language or financial tick data, this dataset has smooth trends with fewer inflection points, which are better captured by simpler architectures.
Takeaways
For larger, higher-priced homes, Simple RNNs still outperformed LSTMs across the board.
History steps of 3 to 5 were optimal for learning from past price trends.
LSTMs continue to struggle with this type of temporal data — especially when the data is smoothed or aggregated, as in this quarterly MA series.
One-Bedroom Units
SimpleRNN
Test RMSE improved with:
Increased history steps: from 2429.75 (History=1, Train=60%) down to 1179.53 (History=5, Train=80%)
Increased training size
Why this worked: The 1BR price data is cyclical and smooth. RNNs benefited from longer memory (history=5) to capture patterns like price dips and rebounds. Increasing the training set allowed better generalization without overfitting.
LSTM
Best configuration: History=5, Train=80%
Test RMSE: 11393.14
LSTM models consistently underperformed RNNs. For example:
History=1, Train=60%: LSTM Test RMSE = 14323.36 vs. RNN = 2429.75
History=3, Train=80%: LSTM Test RMSE = 6629.13 vs. RNN = 1260.64
Why this failed: LSTMs are more complex and require larger datasets. Here, they overfit to training data and failed to learn meaningful test patterns, resulting in flat predictions and poor test RMSE.
Two-Bedroom Units
SimpleRNN
Best configuration: History=3, Train=70%
Test RMSE: 587.17
Notable patterns:
History=3 was optimal across splits
Test RMSEs remained under 4000 for most configurations with history ≥ 3
Why this worked: Two-bedroom unit prices followed a long-term growth curve with leveling off near 2016–2017. History steps of 3–5 were sufficient to model this trajectory. The 70% split gave enough history without truncating the test window.
LSTM
Best configuration: History=5, Train=80%
Test RMSE: 4887.84
Most LSTM setups resulted in very high test RMSEs:
History=3, Train=60%: Test RMSE = 72077.13
History=5, Train=70%: Test RMSE = 75743.20
Why this failed: LSTMs again overfitted and underestimated trends. Some models predicted nearly flat lines in the test period, indicating they failed to extrapolate the underlying upward price curve.
Three-Bedroom Units
SimpleRNN
Best configuration: History=3, Train=80%
Test RMSE: 2233.07
Strong performers:
History=5, Train=80%: Test RMSE = 8292.40
History=3, Train=70%: Test RMSE = 5877.09
Why this worked: RNNs learned price growth and deceleration well when given sufficient history and training data. History=3 provided a balance between recent trend memory and model flexibility.
LSTM
Best configuration: History=3, Train=80%
Test RMSE: 2582.77
Stronger than LSTM results on 1BR and 2BR:
History=1, Train=60%: Test RMSE = 56725.48
History=5, Train=70%: Test RMSE = 14896.99
Why this worked (relatively): LSTMs performed better here, likely due to the more pronounced upward trend in 3BR unit prices post-2012. Still, SimpleRNN was more stable and accurate overall.
Final Takeaways
SimpleRNN consistently outperformed LSTM across all unit sizes, especially in smaller data regimes.
History steps of 3 or 5 were nearly always better than 1, suggesting that short-term memory is insufficient to learn housing trends.
LSTMs struggled due to overfitting and under-learning test patterns, often outputting flat or misaligned curves.
Optimal configurations involved a 70% or 80% training split with longer input sequences (history=3 to 5), which allowed models to generalize better on unseen data.